POPULATION GENETICS OF GENE FUNCTION 

Ignacio Gallo 

Abstract 

This paper shows that differentiating the hfetimes of two phenotypes independently from 
their fertiUty can lead to a qualitative change in the equilibrium of a population: since 
survival and reproduction are distinct functional aspects of an organism, this observation 
contributes to extend the population-genetical characterisation of biological function. To 
support this statement a mathematical relation is derived to link the lifetime ratio T1/T2, 
which parametrizes the different survival ability of two phenotypes, with population vari- 
ables that quantify the amount of neutral variation underlying a population's phenotypic 
distribution. 

During the last decade experimental research has begun using population-genetical princi- 
ples to link the function of genes to their population distribution [121 HSl [U] . This closely 
parallels the way in which statistical thermodynamics links the macroscopic state of a 
physical body to the activity of the molecules that form it, and is likely to bear similarly 
significant consequences. 

Population genetics has studied the mathematical relation of evolutionary and demo- 
graphic forces to the population distribution of alleles for more that a hundred years, 
and bioinformatics has long used conserved sites in comparative statistical data to infer 
function [13j. More recent works can therefore be seen as a joint maturation of these 
long-standing research threads, which enables to pose questions about the relation of 
gene function to gene distribution in a more detailed manner than previously possible: 
the analogous maturation which gave birth to statistical thermodynamics consequently 
allowed to infer the size of a molecule from observable dynamics. 

One important point that has been stressed in recent works is that demographic forces 
can mimic the effect of selection during evolution, so that it is necessary to account 
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for demography when inferring selection from population statistics. In |T6] the authors 
consider the effect of a changing population size, and show that a population's pattern of 
variation may be used to infer this type of demographic information, as well as to locate 
functionally important genetic sequences. 

The present paper considers a further possibility granted by this approach, which is 
to use the different effects of demography and reproductive selection on gene distribution 
in order to infer more detailed information about gene function itself. 

It needs to be stressed that, for the sake of exposition, this paper uses the term "gene 
function" as relating to a univocally defined concept, while yet acknowledging that this 
concept clearly doesn't exist as such. One good way to justify such simplistic use lies in 
the analogy with the term "the meaning of a word", which similarly can sometimes be 
put to good use, while suffering from an analogous ambiguity of definition. 

Life-expectancy, a demographic parameter, is related to function: it reflects, statis- 
tically, the ability of an organism to perform the tasks which are required in order to 
survive for certain amount of time. If life-expectancy is systematically different for two 
phenotypes in a given environment, it is natural to conclude that this difference is due to 
the different way in which the two phenotypes function in such environment. 

Therefore if we consider survival and reproduction to be two macro-functions per- 
formed by any living organism, detecting a difference in life-expectancy -separately from 
a difference in fertility- must allow to get a two-dimensional description of the gene func- 
tion which underlies the phenotypic change. This suggests that including the effect of 
demographic forces may not only be used to infer information about a species' demo- 
graphic history, but also to get more detailed information regarding the specific function 
that a gene is playing in a given environment, using only information which characterises 
the population as a whole. 

This paper shows that life-expectancy can affect the nature of a population's phe- 
notypic distribution in a way which is qualitatively distinct from the effect of fertility. 
It is also shown that if a population contains two phenotypes characterised by different 
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life-expectancies Ti and T2, then the ratio 

T2 

can be quantitatively estimated through the phenotypes' respective amounts of neutral 
variation, independently of all other parameters. 

Since a population's dynamics is typically dominated by differences in fertility, stan- 
dard models tend to combine the effect a genetic change has on an organism's survival 
ability with its effect on fertility [U [T2] . In this paper we study the effect of a difference in 
survival explicitly: we find that this effect is indeed small, but that it leads to qualitative 
changes in a population's equilibrium regime, which are likely to generate statistically 
observable signals at a population as well as at a comparative level. 

The paper is organised as follows: in the next section we describe the empirical justifi- 
cation for the structure of the chosen model, and we then present the model in section 2. 
Our modelling framework consists of two levels: one level studies the equilibrium reached 
by two available phenotypes, and is analysed in section 3. It is at this level that we observe 
a qualitatively novel equilibrium regime that results from differentiating the phenotype 
lifetimes separately from their reproductive rates. 

Section 4 focuses on the second level of description, which corresponds to the amount 
of neutral variation characterising each of the two considered phenotypes. We are inter- 
ested in this variation because it allows to express the parameter A = T1/T2 in terms of 
statistically observable data. 

We conclude the paper by considering the practical limitations of the derived results 
and briefly outlining possible further developments. 

1 Empirical background 

We want to construct a stochastic model for the dynamics of a population consisting of 
two phenotypes which differ both in their average amount of offspring (which we call Wi 
and W2, respectively), and in their average lifetimes (Ti and T2). To this end, in section 2 
we modify the haploid Moran model by a qualitative change in its process, which we 
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Figure 1: Structure of the considered population: organisms carry one of two phenotypes, 
Pi and P2, each of which is coded by many synonymous genotypes that are assumed to be 
neutral with respect to each other. 

attempt to justify on intuitive grounds, and which we parametrize in terms of a novel 



Since the model includes one more parameter than the standard setting, it is desir- 
able to correspondingly expand the number of independent quantities that we expect to 
observe, so to allow the discrimination among possible causes of specific states of the 
population. We address this need by considering the amount of synonymous variation 
included in each of our two phenotypes, which, following [TSj, we consider to be neutral: 
we therefore have a population consisting of two phenotypes coded by a larger number of 
genotypes, which are neutral with respect to each other as long as they give rise to the 
same phenotype (Figure [T]). 

This population structure parallels the structure of variation encountered by Martin 
Kreitman when studying the alcohol dehydrogenase locus in D. melanogaster, where he 
found that only one of the 43 polymorphic sites observed in his sample led to a change in 
the protein coded by the gene, thus revealing that the gene consists of only two molecular 



parameter, A 



1 



T2 



4 



phenotypes coded by a larger number of genotypes [TO] . 

Though the model we consider is clearly much too simplistic to apply to a natural 
population of Drosophila, Kreitman's observation gives empirical justification for building 
a model which allows only one of the sites of a long genetic sequence to lead to a change 
in phenotype, which would a priori seem to be a very strong assumption. 

We will see that by using such simplification -which is further supported by more 
extensive studies [1]- our model allows both to derive a clear characterisation of the 
phenotypic equilibria (section 3), and to estimate the model's novel parameter A (section 
4) in terms of statistically observable quantities, by using variations of standard results. 

2 Modelling approach and its relation to the stan- 
dard setting 

The structure of our model reflects the empirical situation described in the last section: 
we consider a population of organisms carrying genotypes of length L, where each 
genotype-site can take two possible states: however, only one of these sites corresponds to 
a change of phenotype, whereas mutations in other sites are neutral. It is worth pointing 
out that this is the same structure of the Drosophila haplotype data included in the 
appendix of [1], where only mutations in one specific molecular-marker site lead to an 
amino acid change. 

The model therefore includes two levels of description: a phenotypic one, which de- 
scribes the dynamical change in the number of individuals carrying the two available 
phenotypes, and a genotypic level that characterises the amount of neutral variation 
available for each of the two phenotypes. 

We use the symbols Pi and P2 to denote our two phenotypes, and we keep the pop- 
ulation size fixed at a value N: we can therefore specify the phenotypic state of the 
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population by a random variable X which gives 

X : number of individuals with phenotype Pi, 
N — X : number of individuals with phenotype P2. 

We are interested in studying the stochastic equilibrium reached by a process of death, 
reproduction and reversible mutation, where mutation happens with the same probability 
u in both directions, and for all the available sites, including the phenotypically-linked 
one. 

The assumption of a mutation rate which is both symmetric and site-independent is 
very idealised, and even more importantly, an equilibrium due to reversible mutation is 
not generally considered to be relevant for generic mutations [S]. 

Due to its simplicity, however, the chosen setting allows to show with remarkable 
clarity that an explicit difference in the phenotype lifetimes leads to a qualitative novel 
type of equilibrium state: this is the aim of this paper, since it is in this novelty that we 
see potential to extend the standard population-genetical characterisation of biological 
function; having a full characterisation of this elementary case should be useful when 
considering more realistic and analytically challenging situations. 

In the next subsection we therefore describe a variation of the Moran process which, 
as we will attempt to justify on intuitive grounds, provides a good model for pheno types 
that are allowed to differ independently in lifetime and average number of offspring. It 
is interesting to point out that Moran introduced in Ref. [T2j a variation of his original 
model that implements reproductive selection by differentiating his alleles' lifetimes while 
keeping the instantaneous reproductive rates the same for all his phenotypes (thus dif- 
ferentiating their life-long reproductive yields): Moran remarks that considering lifetime 
and reproductive differences separately would almost certainly make no difference for the 
equilibrium distribution. Our implementation of the phenotypic difference comes as a 
natural extension of Moran's, and our claim is that, though the subtlety of this exten- 
sion's effect roughly confirms his intuition, the qualitative nature of this change provides 
considerable descriptive potential. 
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As stressed before, when introducing a quantity A to parametrize the differentiation 
in the hfetimes, it becomes desirable to extend the set of quantities that we expect to 
observe in the statistics of the population: in other words, though the quantity 

X 

X = — 
N 

fully characterises our population's phenotypic state, we can gather further information by 
looking at how the X individuals carrying phenotype Pi are partitioned into synonymous 
genotypes, and similarly for the N — X individuals carrying phenotype P2. 

A natural intuitive choice to characterise neutral variation in the two phenotypes would 
be to count the actual number of synonymous genotypes present for each: however, as 
remarked in an interesting alternative is to use the inbreeding coefficient concept. 

The inbreeding coefficient is typically used for diploid organisms, since it is defined 
as the probability that a given genetic locus is homozygous: that is, it is the probability 
that for a diploid organism chosen at random from a population, the two alleles that the 
organism contains at such locus are found to be identical. However, under the assumption 
of random mating, this turns out to be equivalent to the probability that any two alleles 
drawn at random from the population are identical, regardless of the separation into 
organisms. 

The latter definition makes the quantity relevant to haploid populations, and it turns 
out to a more analytically accessible one than the actual number of synonymous genotypes, 
at least for the estimation purpose considered in section [4], in which we generalise a result 
taken from [9]. The inbreeding coefficient also provides an effective approximation to the 
actual number of synonymous alleles, and it has been suggested that it might be more 
empirically accessible than the latter quantity [1]. 

2.1 Phenotypic level 

Here we describe the process through which we model the change in our population's 
phenotypes, and make our attempt to justify the modelling choice through intuition: the 
observable outcome of this process is analysed in section [3] 
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Studies using population genetics to infer function are typically based on the Wright 
model [m [16], which describes the stochastic change of a population at discrete non- 
overlapping generations, and it is customary to describe the allele dynamics by using a 
continuous approximation to this process. There is, however, a somewhat paradoxical 
aspect to this standard modelling approach: whereas the Wright model describes the 
population as changing in discrete generations which might last for a considerable amount 
of time, the continuous approximation requires such generations to be taken of vanishing 
duration. 

This assumption is well justified by the fact that processes are often considered to 
be taking place on an evolutionary time scale, which is much longer than the generation 
time, as well as by the fact that if one assumes organisms not to be subject to ageing, 
which is virtually unavoidable at the simplest level of description, the Wright model is 
formally equivalent to a process of death and birth [3]. 

On the other hand, the point of view of this paper is that, though suitable given 
the typical assumptions, the reliance on a discrete generation setting might hinder the 
consideration of relevant modifications: here we propose a modification to the Moran 
model that stems from features of an instantaneous event, which are prohibitively difficult 
to visualise at a generation level. 

The Moran model describes the process of change in a population consisting of two 
types of organism. A time interval in this model is defined by the occurrence of a death 
event, followed by a birth event. It is sensible, in principle, to define time through these 
events, since these change the state of the population, which is the object under study. 

There is a reason, however, to regard time as fiowing according to an external frame 
of reference, thus including time intervals during which no population events happen: 
death events may be thought to take place at a rate determined by the physiology of the 
organisms; due to competition, however, birth events may be thought to happen instan- 
taneously when the death of an individual makes a safe spot available in the environment. 

In fact, both the Wright and the Moran model set the population size to be fixed at 
a value N, which represents the environment's carrying capacity: the meaning of this 
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(b) 

Figure 2: Here we illustrate the difference between the (a) Moran process and (b) the 
process considered in this paper. At every time interval in (a) an organism is chosen to 
die, and a new one is chosen to replace it: the types of the dead and newborn organisms 
determine change in phenotype Pi 's frequency X as —1, or 1. The loop in (b) shows 

that in the present model time flows according to an external reference: this allows to 
characterise the different nature of birth and death events. 
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constraint is that a death event should be interpreted as one which vacates an environ- 
mental safe spot, and we argue that the presence of competition makes it intuitively 
admissible to model the subsequent birth event as happening instantaneously as soon as 
the environmental spot becomes available. 

As a consequence, having an external time reference allows to describe more adequately 
the interplay between the two different types of competition which characterise 1) an 
organism's struggle to survive, and thus to preserve its environmental spot, and 2) the 
reproductive struggle to occupy all spots as soon as they become available. 

According to this interpretation, all organisms in a population might be thought to be 
playing a waiting game similar to the children's game "musical chairs" where A^ — 1 chairs 
are available for A^ children to sit on when the music stops. This process has already been 
considered in an evolutionary setting in p]: in our case, rather than focusing on modelling 
the competitive game which determines the allocation of an available spot, we consider 
this allocation to happen trivially and instantaneously, and we focus on the waiting game 
itself. 

In practice, our modification of the Moran model consists of a process which allows 
at most one death-birth event per time interval rather than exactly one as in the original 
model [H] : the two diagrams in Figure |2] illustrate this difference. 

Figure [2|^a) shows the change in the phenotype frequency X during a time interval in 
the original Moran model: an individual is chosen at random from the population and 
killed, and then replaced by a new individual. The change in X is then determined by 
the phenotypic identity of the dead and of the newborn. 

Differently from Figure [2|^a), Figure [2]^b) (which describes our process) contains a 
loop at the origin of the diagram. This formalises the different nature of the death and 
birth events: at a given time instant no organisms might die; when a death does happen, 
however, a birth systematically follows instantaneously. 

This modelling choice is arbitrary: contrarily to our assumption, following a death 
competitive conflicts between organisms might lead to a substantial delay in the allocation 
of the newly-vacated spot, and this could considerably change the nature of the process. 
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Figure 3: Transition probabilities for the fundamental population events in our process: 
p~ and p~^ correspond to death and birth (after mutation) of organisms with phenotype 
Pi. Similarly we have q~ and for P2, and the existence of the loop at the origin of the 
diagram is due to that in general p^ + < I. 

This objection, however, only highhghts the descriptive potential of a modelling approach 
that uses intuition to consider fundamental population events in some detail, an approach 
for which a considerable gap exists in the mathematical biology literature: here we look 
at the consequences of a simple such possibility. 

The model is therefore defined by the transition probabilities p~, q~ , p'^ and q^ in 
Figure |3} which are in turn derived from the life-cycles of the two organisms. 

Transition probability p^ corresponds to the event that an organism with phenotype 
Pi dies, whereas q~ corresponds to the same event for phenotype P2. 

We denote the relative frequency of phenotype Pi by a; = X/N, and its average lifetime 

by Ti'. under the assumption that each organisms is reproductively mature at birth and 

is not subject to ageing, we have that 

_ X _ 1 — X 

p = — , and g = — 

J-l ^2 

and the novelty of the model corresponds to that in general we have 

p~ + q~ < 1. 

We denote by Wi the average number of offspring produced by an organism with phe- 
notype Pi during its entire lifetime, and by u the probability of mutation from phenotype 
Pi to phenotype P2 after reproduction. 
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Under these life-cycle conditions it can be shown that the birth probability for phe- 
notype Pi, conditional to the event that a death has taken place, is given by 

—— [ \ — u)x + —— u[l — x) 
+ _ Ti ^ ' T2 ^ ^ 

and, since we assume the mutation probability to be the same in both directions, we 
similarly have that for P2 

-z— UX + — U)[l — X) 

+ _ Ti T2 ^ ' 

Ti T2 ^ ^ 

Like we stressed before, a reproduction event is assumed to happen instantly when a death 
event vacates an environmental spot, which leads to 

+ = 1. 

We are interested in including parameter values for which the equilibrium distribution 
does not become trivial in the limit of large population size, and to this end we employ 
the following asymptotic scalings for the mutation parameter 

Nu — > e, 

and for reproductive selection parameter 

Vi 




1 I ^ s, 

N-¥oo 

which we use as definitions for the rescaled parameters 6 and s. For convenience we also 
use the parameter A for the ratio between lifetimes: 

^ = ^- 

In section [3] we will need the first two moments of the change in the variable X at a 
given time to write down the large population size limit for the equilibrium distribution 
attained by the phenotypes. To this end, after defining 

AX{x)=Xt+,-Xt, 
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we need to compute quantities M{x) = E [AX(a;)] and V{x) = E (AX(x)) in terms 
of the transition probabilities defined above. The functional dependence on x shows that 
this moments are computed conditionally on the relative frequency of phenotype Pi being 
equal to x = X/N: for convenience, however, from now on we drop the x dependence 
from the notation. 

Using inspection on Figure |3] we find that 

M= lim (q^p^—p^q^), 



and that 



V = lim (g p^ + p q^ 



which in terms of the asymptotic parameters gives 

^_ 1 ex^i-xy + Xsx{i-x)-ex^ 

~ nT\ x + \{1-x) ' 

and 

^ 2A x{l-x) 



NTi x + X{l-x)' 

We see that the novelty of the model is nicely shown algebraically by the presence of 
a factor {x + A(l — x)) in the denominator of both M and V, its presence in the latter 
being particularly significant for the form of the equilibrium distribution: we discuss the 
analytic consequences of this in section |3| 

2.2 Neutral variation 

Underlying the process of change in the phenotypic frequencies we have the process of 
creation of new neutral mutations to phenotypes Pi and P2, and of their stochastic loss. 

As mentioned in the introduction to this section, we model each genotype as a se- 
quence of L two-state sites, which includes a site (the "phenotypically-linked" site) whose 
mutation causes the change between phenotypes Pi and P2, and for the sake of simplicity 
we make the rather strong assumption that mutation happens with same probability u 
at all sites, and in both directions: therefore, the probability of mutation u relevant to 
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the phenotypic equilibrium also parametrizes the amount of neutral variation for the two 
phenotypes. 

Like we said before, rather than using the actual number of neutral genotypes into 
which each phenotype is partitioned, we choose to characterise neutral variation by the 
inbreeding coefficient. For a population of haploids, such as the one we consider, the 
inbreeding coefficient can be defined as the probability that two genotypes drawn at 
random from the population are identical. 

Therefore, in addition to random variable x that characterises the population's phe- 
notypic distribution, we define the two quantities 

Fi = probability that two organisms with phenotype Pi have the same genotype, 
F2 = probability that two organisms with phenotype P2 have the same genotype. 

In section |4] we find an explicit formula for the new parameter A in terms of com- 
bined moments of quantities Fi, F2 and x: in this paper's point of view such a relation 
contributes to extend the population-genetical characterisation of biological function. 

Kimura and Crow |9] find that in an "infinite alleles" model, which compared to the 
model presented here may be thought of as one where only one phenotype exists, and 
where genotypes have an infinite number of sites, the inbreeding coefficient is on average 
equal to 

(F) ^ . 

^ ' 2Nu+l 

In section |4] we generalise their calculation to include our case, and see how the result can 
be used to express the parameter A = T1/T2 in terms of statistically observable quantities. 



3 Phenotypic equilibrium 

Here we describe the equihbrium distribution attained by the population's phenotypes 
Pi and P2 under our process of selection and reversible mutation. It is worth stressing 
immediately that the qualitative novelty of including the differentiation of phenotype 
lifetimes manifests itself analytically in the probability density function of Pi's relative 
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frequency x, 

(j){x) = Ce""" x^^-\l - xf/^-^ (x + A(l-x)), 

through the factor (x + A(l —x)), which is not usually seen in population genetics models. 
When A = 1 this factor is equal to one, and we recover the typical equilibrium distribution 
for a haploid population under reversible mutation and selection. 



3.1 Form of the equilibrium distribution 

The form of the phenotypic equilibrium distribution can be obtained in a large population 
size approximation by using Wright's formula 




<P{x) = exp <J 2 / -^dx }, (3.1) 

where C is a normalisation constant. 

This formula was derived by Wright [I7j for a process divided into non-overlapping 
generations, and it has been proved by Moran to be applicable to various versions of 
his model U2j. More importantly for our case. Cannings [3] has shown by a concise 
observation that an overlapping generations model can be considered formally equivalent 
to a non-overlapping one as long as organisms are not subject to ageing, and this allows 



us to use approximation (3.1) in its full generality. 
According to the last section our model gives 

^ 1 ex^ii-xY + Xsxii-x)-ex^ 



NTi X + A(l - x) 

and 

^_ 2A x(l-x) 



NTi x + A(l-x)' 

so our model's equilibrium for Pi's relative frequency x = X/N takes the following form 

(j){x) = Ce"'' x^'-\l - xf/^-^ {x + X{l-x)), 

where 

a = s + 6'Q-A 
15 




X (relative frequency of P,) 

(c) 

Figure 4: These are the three basic types of equilibrium distribution which can be attained 
at a phenotypic level, and which correspond to different intensities of mutation in the 
following way: (a) Nu <^ 1, (b) Nu ^ 1, (c) Nu ~ 1. Regime (c) is caused by the 
differentiation of the phenotype lifetimes, and appears when A 7^ 1 . 
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and C is the normalization constant which ensures that 

1 

(f){x)dx = 1. 

We see that for A = 1 (that is, when Ti = T2) we recover the equihbrium distribution 
for a typical haploid random drift model with reversible mutation. When A 7^ 1 we 
have that the factor {x + A(l — x)) produces a qualitative difference in the shape of the 
distribution, though this only happens for values oi 6 = N u close to one: Figure |4] shows 
the shape of the equilibrium distribution for different values of the mutation parameter. 

We find that the there is an intermediate regime (Figure |4|^c)) between the typical 
low-mutation U-shape (Figure |4]^ a)) and the high-mutation bell-shape (Figure |4]^b)). The 
intermediate regime admits two stationary points, and as a consequence becomes bimodal. 
In general, we have that for any parameter value the type of equilibrium can be charac- 
terised in terms of the number of stationary points which the distribution exhibits: we 
carry out such characterisation in the next subsection. 

3.2 Diagram characterising the equiUbrium population's modes 

In the last subsection we saw how a difference in the lifetimes of our two phenotypes can 
lead to a new type of equilibrium regime for the population, which consists of a hybrid 
between the classical low and high mutation regimes: this new regime exhibits both a local 
probability maximum (as in the high-mutation case) and a maximum at the boundary 
(as in the low- mutation regime). 

Looking at the the shapes of the equilibrium distributions in Figure [4], we see that 
these shapes can be well classified in terms of the number and nature of their stationary 
points, which in turn determine the number and location of the distribution maxima, or 
modes. 

Figure |4]^a) has only one stationary point, which is a minimum, and this implies the 
existence of two modes at x = and x = 1 (the probability density in fact diverges to 
infinity at these boundary points, though this singularity does not affect the possibility of 
normalising the distribution): this characterises the regime of low mutation as one where 
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the population polarises about one of the phenotypes, and rarely switches to the other. 

Figure |4](b) also has only one stationary point, which is a local maximum: this suggests 
that the dynamics of the population in this regime will typically be one where a mixed 
phenotypic state fluctuates around this maximum. 

Figure |4|^c) shows a novelty of the present model: we have two stationary points, 
a local maximum and a local minimum, which implies the existence of a second mode 
also at one of the boundaries. This suggests that the dynamics will tend to polarise the 
population around one specific phenotype: the existence of the local maximum, however, 
suggests that this state of polarisation should be periodically lost in favour of a mixed 
configuration for the two phenotypes, and that this switch should happen considerably 
more often than the switch between the polarised states of (a). This, however, can only 
be elucidated by considering the model's dynamics explicitly, and could be the topic of a 
future work. 

In order to understand how parameter values relate to cases (a), (b) and (c), we use the 
probability distribution function we obtained in the last section to locate the stationary 
points of the distribution 0. 

The condition 

dx 

which gives the stationary points, leads to the following rational equation for x: 

a+- ^ )(x + A(l-x)) +1-A = 0, (3.2) 

X 1 — X J 



where 



6 /I 
a = A6' — 1, 6= — — 1, and a = s + 9i— — X 
A V A 



Multiplying equation (3.2) by factors x and (1 — x), we obtain a polynomial equation 
of degree 3, which admits three solutions. We are, however, only interested in solutions 
lying on the real interval going from to 1, since these correspond to meaningful values 
for the relative frequency x. 



Rather than solving the cubic in x, we can use (3.2) to find a functional expression 



for 6 = N u (for which equation (3.2) is linear). By considering ^ as a function of x, and 
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"u": probability of mutation per site "u"; probability of mutation per site 



random drift mutation/selection balance 

(a) (b) 

Figure 5: Stationary points for the equilibrium distribution (a) in the classical case where 
X = T1/T2 = 1, and (b) for X = T1/T2 = 3/2. Each line corresponds to a different value 
of the selection coefficient "s": dashed lines are for local minima, solid lines for local 
maxima. When A 7^ 1 bimodal equilibrium states exist: the dotted line corresponds to the 
value of the mutation probability "u" which gives rise to the distribution in Figure^c), 
for the value of the selection coefficient " s " corresponding to the line highlighted in red. 

looking at this function for different values of parameters s (reproductive selection) and 
A (the lifetime ratio), we get a full characterisation of the distribution's stationary points 
and, as a consequence, of its modes: we do this in Figure [5j 

Figure |5] shows the stationary points for the equilibrium distribution: the two pictures 
correspond to the two different cases A = 1 and A 7^ 1. Different lines correspond to 
different values of the selection coefficient s, for which they give the dependence of the 
position of the stationary points on the mutation probability u. Dashed lines correspond 
to local minima and solid lines to local maxima. 

Figure IsFa) shows the classical situation where Ti = T2 (A = 1). As expected, we find 
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an abrupt transition in the nature of the stationary points when the mutation probabihty 
u = l/N, at which value the equihbrium distribution turns from being U-shaped to being 
bell-shaped: however, the diagram shows that all parameter values except u = l/N lead 
to only one stationary point in the equilibrium distribution. Highlighted in red we see the 
functional dependence of the unique stationary point on the mutation probability, for a 
particular value of the reproductive selection coefficient s. 

Figure |5](b) shows the same diagram for A = T1/T2 = 3/2: we see highlighted in red the 
dependence of the equilibrium distribution's stationary points on the mutation probability, 
for the same value of reproductive selection s used in for the red curve in Figure [sj^a). The 
diagram shows how near u = l/N there are regions where two stationary points coexist 
for the same distribution, a situation which is not encountered in classical population 
genetics models for haploid populations. The dotted line, which shows an example of 
this, corresponds to the regime in Figure |4](c). 

An important fact which we learn from Figure [5] is that the diagram for A = 1 is 
not robust with respect to changes in the parameter values, whereas it is robust for any 
other value of A. This means that for any A 7^ 1 the diagram will exhibit regimes where 
the equilibrium distribution admits more than one stationary point, and the transition 
between the different qualitative types of equilibrium will come about through the same 
type of bifurcations which we see in Figure Istb). 




In this section we derive the parameter A = T1/T2 from population data, and in partic- 
ular from statistical quantities characterising the amount of neutral variation underlying 
phenotypes Pi and P2. As mentioned above, in order to quantify the amount of neu- 
tral mutation we use the inbreeding coefficient concept, by defining the following two 



Fi = probability that two organisms with phenotype Pi have the same genotype, 
F2 = probability that two organisms with phenotype P2 have the same genotype. 




4 Estimation of A 



T2 



quantities 
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An important reason for using this approach is that it allows to extend an intuitive 
result obtained by Kimura and Crow in [9J , where the equilibrium value for the inbreeding 
coefficient was computed for a population consisting of only one phenotype, rather than 
of two pheno types like in our case. 

The basic observation that allows Kimura and Crow's calculation is that, if we denote 
the inbreeding coefficient for their unique phenotype by F, and if we assume the population 
changes according to Wright's process, in the absence of mutation the value of F changes 
according to the following equation: 

F{t + l) = l-+(l-l-\F{t). (4.3) 



The intuitive reason for this is that at generation t + 1 any two individuals have a prob- 
ability of being born from the same parent: if they do, in the absence of mutation 
they share the same genotype with probability 1, which gives the first term on the left 



hand side of (4.3). 

In the presence of mutation, assuming that each mutation generates a mutation which 
previously didn't exist. 



f(( + l) = |^+(^l-^jfM|(l-a 

which leads to the the equilibrium average value 

, , l-2u 1 , , 

(F) = ^ . 4.4 

^ ' 2Nu~2u+l 2Nu+l ^ ' 

The "infinite alleles" assumption, according to which each new mutation produces a 
genotype not contained in the population, is equivalent to assuming that the length L 
of our genotype is very large, and we shall be making the same assumption in order to 



generalise (4.4). 



The line of reasoning used to obtain (4.4) can be extended to our situation, where a 
haploid population subdivided into two phenotypes Pi and P2 changes by single death- 
and-birth events, rather than at discrete generations: since this setting is less symmetric 
the details are more articulated, though the idea behind the calculation remains the same. 
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In our model the change in Fi at time t depends 1) on whether a death has taken 
place, 2) on the phenotype of the organism that died, and 3) on the phenotype of the 
newborn (taking into account the possibility that a mutation might have taken place). 

To reduce the complexity of the calculation we make the assumption that a newborn, 
before mutation, shares the same phenotype of the organism which has died. This would 
only strictly hold if the relative sizes for phenotypes Pi and P2 stayed a fixed throughout 
the process: in Figure |6] we show, however, the result of a simulation that suggests that 
our assumption leads to a formula which gives a rather precise approximation, and this 
is sufficient to support the paper's claim that it is theoretically feasible to extend the 
population-genetical characterisation of biological function. 

To compute the change in Fi we therefore need to consider three cases for the type of 
event taking place at time t: 

A: an organism of type Pi dies, is replaced by a newborn of the same type, and the 
newborn might mutate, either at the phenotypically-linked site, or at one of the 
neutral sites, 

B: an organism of type P2 dies, is replaced by a newborn of the same type, and the 
newborn might mutate, either at the phenotypically-linked site, or at one of the 
neutral sites, 

C: no death takes place, so no replacement happens. 

According to our process, and taking into account our simplifying assumption -that 
sets the phenotype of a dead organism equal to that of the subsequent newborn- the 
probabilities of events A, B and C are: 

J-l -'2 J-l J-2 

Keeping in mind that the quantity Pi is defined as the probability that two organisms 
chosen at random from the population and which have phenotype Pi also share the same 
genotype, we now need to find how Pi changes in each of the three cases A, B and C. 



22 



Preliminary calculation of sampling probabilities: 

The probability Fi{t + 1) is associated with a couple of organisms drawn at random 
from the population, and therefore it depends on whether one of the two sampled organ- 
isms happens to be the one which was born during the last step. In particular, we need 
to distinguish the following three sampling events: 

: the newborn organism is chosen in the sampling, but its parent is not, 
S2 : the sampled couple consists of the newborn and its parent, 
Ss : the newborn is not sampled. 

Assuming that the organisms are sampled from the population without reinsertion, the 
probabilities of events Si, S2 and S3 are as follows: 

p(50 = 4. ^ 



X X-1 X{X-1)' 

^^^'^ = x{x-iy 



X x-1 X{X-1)' 

where, like before, X is the total number of individuals with phenotype Pi. 

Using these probabilities we can now find Fi{t + 1) in each of the three cases A, B 
and C 

Case A: 

To see how Fi changes after an event of type A wc need to know two things: 

- whether the newborn mutated (and whether the mutation happened at the site 
linked to the change in phenotype), 

- whether one of the two organisms which are selected at random to compute the 
probability Fi is the newborn (and whether the other happens to be its parent 
organism) . 

If a mutation happens at the phenotypically-linked site, any two organisms of type Pi 
sampled at time t -\- 1 will have the same probability of sharing the same genotype, as 
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they did at time t: since the probabihty of mutation at any site is u, this will contribute 



to the average value of Fi at time t + 1, conditional to event A. 

If a mutation happens at a neutral site, the probability of the newborn having the 
same genotype as any of the other organisms is zero (this is a consequence of assuming 
that L is large enough for each neutral mutation to produce a totally new genotype). 
Since the probability of a neutral mutation is (L — 1) u, we have that in this case the 
contribution is 



The first term on the left-hand-side corresponds to the event that the newborn is chosen 
at the sampling (events 5*1 and 5*2 above), whereas the second term, which gives the non- 
zero contribution, corresponds to the fact that the probability Fi remains unchanged as 
long as none of the chosen organisms is the newborn (event S3). 

Finally, for the case in which no mutation happens at any site, which has probability 
{1 — Lu), we get the following contribution: 



where the second term corresponds to the fact that, as long as no mutation takes place, 
the probability that the newborn shares its genotype with its parent is equal to 1. 

Therefore, if we denote the value of Fi{t + 1) conditional to event A by Fi{t + 1\A), 
summing all three contributions we get 



F,{t + l\A)^uF^{t)+F,{t)iL-l)uPiSs) + {l-Lu){F^{t)■{PiS,)+PiSs))+P{S2)}. 



In this case we have a newborn of type P2. Like for case A, we use the term Fi{t + 1\B) 
to the denote the new value of Fi conditional to B, and we have that 



uFi{t) 



• (L - 1) « ( P{S,) + P{S2) ) + Fi(t) ■{L-l)u P{Ss) = F,{t){L -l)u PiS^), 



(1 - Lu){F^{t) ■ {P{Si) + P{Ss)) + 1 • P{S2)}, 



Case B: 
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It's straightforward to see why: if the newborn is of type P2, the inbreeding coefficient 
remains unchanged unless the newborn mutates in the phenotypically-hnked site and is 
subsequently chosen in the sampling: the probability of this event is u ■ (P(-S'i) + P(5'2)). 

Case C: 

In this case nothing happens, so we have that 

F,{t + l\C)^F,{t). 



We can now write the value of + 1) as the sum of the conditional contributions 
multiplied by their respective probabilities: 

Fi{t + 1) = Fi{t + 1\A) P{A) + Fi{t + 1\B) P{B) + Fi{t + 1\C) P{C), 

and we can use symmetry to obtain an analogous relation for F2. 

In the limit of large N these two relations simplify substantially, so that up to order 

l/iV^ we get 

where the terms x and {1 — x) at the denominator arise from the sampling probabilities 
P(S,), P{S2), P{Ss). 

Therefore, denoting by (•) the average with respect of all realisations of our process, 
we get the following relations linking the moments of the observable quantities to the 
model parameters: 

(Fi/x)+^^(a((Fi/x)-(Fi)) + (L-1)(Fi)) = 
{F2/{l-x)) + e(l{{F2/{l-x))-{F2)) + {L-l){F,)) = (1/(1 -x)). 
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Figure 6: Comparison of the actual value of X = T1/T2 and the value estimated from Eqn. 



{4-5), for simulations using values ranging from X = .5 to X = 2. The moments needed 



for Eqn. {4-5) are estimated from 10000 process realisations for each value of X; the other 



parameters are s = —5, u = 0.007, = 1000 and L = 40. 



Though the notation is somewhat cumbersome, it is easy to see that these two equa- 
tions offer a relation between the model parameters 6 and A and the averages of the six 
random quantities 

p„ i — f_ 

X 1 — X X 1 — X 

all six of which are in principle statistically observable. 

Since this paper focuses on the parameter A = T1/T2, in virtue of its putative relevance 
in terms function, we proceed by solving both equations for ^, and equating them in order 
to find a relation for A. 

In order to express the mentioned relation in a more compact form we define the 
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following auxiliary quantities 

- {F,/x) 



R 



In terms of these quantities, the equation for A takes the following form 

AQi + L - 1 
^Q2 + A(L-1)' 

and this relation leads to a quadratic equation that only admits one non-negative solution: 
A = ^{(i? - 1){L - 1) + y/{R-mL-l)' + 4RQ,Q2y (4.5) 

Figure [6] shows the result of using formula ( |4.5 ) to estimate A, for a series series of 
simulations where the real value of A ranges from .5 (i.e. Ti = 1/2 T2) to 2 (i.e. Ti = 2T2): 
we see that the average values of such estimations are well aligned with the actual values. 

The magnitude of the standard deviation for our estimations, on the other hand, is 
considerable, especially in view of the fact the 10000 realisations of the process were used 
to estimate each value of A: it is clear that a substantial increase of efficiency will be 
needed to make the theory relevant to actual empirical phenomena. 

This practical consideration should not obfuscate, however, the fact that equation 



(4.5) provides a relation between population statistics given by x, Fi and F2, and param- 
eter A = T1/T2, which contains functional information related to a gene's effect on an 
organism's ability to survive, rather than on its reproductive fitness. 



5 Outlook 

We have shown that the effect of differentiating the lifetimes of two phenotypes inde- 
pendently from their fertility leads to a qualitative change in the equilibrium state of a 
population: since survival and reproduction are quite distinct macro-functions performed 
by any living organism, this contributes to extend the population-genetical characterisa- 
tion of biological function. 
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We have furthermore shown that, by using information provided by neutral variation, 
the hfetime ratio A can be expressed exphcitly in terms of statistically observable quan- 
tities, and independently of all other parameters. This both gives some support towards 
an eventual empirically relevance of the proposed modelling approach, and suggests ob- 
servable quantities that can be useful in characterising the stochastic equilibrium of a 
population in terms of the functional features of the individuals which comprise it. 

It needs to be stressed, however, that the statistical resolution needed to estimate A 
efficiently following this method seems to go beyond what could be achieved empirically: in 
order to obtain Figure |6j 10000 realisations of the system were needed for each parameter 
value, and for each value 5000 generations were needed for the population to relax to its 
stochastic equilibrium. 

However, this study aims only to be a proof of principle, and should be only considered 
only as a "worst case scenario" , which nevertheless shows that inferring functional details 
from population genetical considerations is a definite theoretical possibility. It is left for a 
future work to assess its practical feasibility by improving the estimation efficiency, pos- 
sibly while considering dynamical statistics explicitly: it is useful to remember, however, 
that the dynamics of no system has ever been understood without a sufficient grasp of 
how relevant forces balance one another to allow observation. 
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